Add HTML input support #313

Merged: 7 commits, Feb 12, 2020

fsteeg commented Jan 23, 2020

Parse HTML with jsoup, write XML. See example in test.
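
For illustration only, here is a minimal sketch of the general idea (parse HTML with jsoup, serialize it as XML). It is not the code from this pull request: the class name `HtmlToXmlSketch` and the method `htmlToXml` are hypothetical, and it assumes jsoup is on the classpath. It relies on jsoup's XML output syntax, which may differ from the output format this PR settles on.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper, not part of this pull request.
final class HtmlToXmlSketch {

    // Parses (possibly malformed) HTML and returns it serialized as XML.
    static String htmlToXml(final String html) {
        // jsoup normalizes the input into a full document tree
        // (html, head and body elements are added if missing).
        final Document doc = Jsoup.parse(html);
        // Switch the serializer from HTML to XML syntax, so that void
        // elements such as <br> are written as self-closing tags.
        doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
        return doc.html();
    }

    public static void main(final String[] args) {
        System.out.println(htmlToXml("<p>Hello<br>world</p>"));
    }
}
```

The resulting string is meant to be consumable by an XML parser, which is the kind of hand-off the `decode-xml` idea in the commits below hints at; how the reader in this PR actually emits its output is not shown here.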

See #312

Opening as a draft pull request for some initial discussion. I suggest we only merge once we have successfully used this in our full use case. In particular, config options and output format are still to be determined.

@dr0i: I've set this up as a separate Gradle project, mostly because it adds the jsoup dependency, and because it fits with the overall structure that we have. What do you think?

Parse HTML with jsoup, write XML. See example in test.

See #312
To use with decode-xml, but how to test?

See #312
With `decode-html` flux command

See #312
Set generated record ID, only process content of leaf nodes

See #312
fsteeg changed the title from "Basic HtmlReader with html-to-xml flux command" to "Add HTML input support" Feb 7, 2020
fsteeg commented Feb 7, 2020

With the (functionally reviewed) scenarios in hbz/oerindex#2 and hbz/oerindex#3, this could now resolve #312. It also contains 47a5ba7, which resolves #314.

fsteeg marked this pull request as ready for review February 7, 2020 10:58
fsteeg requested a review from dr0i February 7, 2020 10:58
fsteeg merged commit 3c9998c into master Feb 12, 2020
fsteeg mentioned this pull request Mar 16, 2020
dr0i deleted the 312-html branch April 14, 2020 12:27
blackwinter added a commit that referenced this pull request Dec 13, 2024
…Field

Change `copy_field()`/`move_field()` Fix functions to destructive behaviour for Catmandu compatibility.
2 participants